Nankai University, UESTC
This paper presents a simple but performant semi-supervised semantic segmentation approach, called CorrMatch. Previous approaches mostly employ complicated training strategies to leverage unlabeled data but overlook the role of correlation maps in modeling the relationships between pairs of locations. We observe that the correlation maps not only enable clustering pixels of the same category easily but also contain good shape information, which previous works have omitted. Motivated by these, we aim to improve the use efficiency of unlabeled data by designing two novel label propagation strategies. First, we propose to conduct pixel propagation by modeling the pairwise similarities of pixels to spread the high-confidence pixels and dig out more. Then, we perform region propagation to enhance the pseudo labels with accurate class-agnostic masks extracted from the correlation maps. CorrMatch achieves great performance on popular segmentation benchmarks. Taking the DeepLabV3+ with ResNet-101 backbone as our segmentation model, we receive a 76%+ mIoU score on the Pascal VOC 2012 dataset with only 92 annotated images. Code is available at BBBBchan/CorrMatch.
Common practices for semi-supervised segmentation tasks:
CorrMatch is a simpler framework with no need for multiple networks, training stages, or strong augmentation data streams.
Challenges:
- the proportion of pseudo labels
- accuracy via threshold adjustments, i.e., dynamically adjusting thresholds
Correlations between pixels can reflect the pairwise similarities.
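As a rough illustration of what "pairwise similarities" can look like, the sketch below builds a correlation map from per-pixel feature vectors. It assumes cosine similarity as the similarity measure, which these notes do not specify; the function name `correlation_map` and the toy features are hypothetical.

```python
import math

def correlation_map(features):
    """Pairwise cosine similarity between per-pixel feature vectors.

    features: list of HW feature vectors, each a list of D floats.
    Returns an HW x HW matrix whose entry [p][q] is the similarity
    between pixel p and pixel q.
    NOTE: cosine similarity is an assumption made for illustration.
    """
    # L2 norm of each feature vector; fall back to 1.0 for all-zero vectors.
    norms = [math.sqrt(sum(v * v for v in f)) or 1.0 for f in features]
    # Normalize every vector to unit length.
    unit = [[v / n for v in f] for f, n in zip(features, norms)]
    # The dot product of unit vectors is their cosine similarity.
    return [[sum(a * b for a, b in zip(p, q)) for q in unit] for p in unit]

# Toy features for 3 pixels: the first two point in nearly the same direction.
feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
corr = correlation_map(feats)
```

In this toy case, row 0 of `corr` scores pixel 1 much higher than pixel 2, which is the kind of grouping signal that lets pixels of the same category be clustered.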
We are given a set of labeled images and \(N\) unlabeled images, with \(K\) classes.
Start from a pseudo-label training paradigm, in which the prediction on the weakly augmented view \(x_i^w\) is treated as the pseudo label for the strongly augmented view \(x_i^s\):
\[\mathcal{L}_u^h = \frac{1}{N} \sum_i^N \ell_c\left(\mathcal{F}\left(x_i^s\right), \mathcal{F}\left(x_i^w\right)\right).\]
Since \(\mathcal{F}(x_i^w)\) is inaccurate, we keep only those reliable results:
\[\mathcal{L}_u^h = \frac{1}{N} \sum_i^N \ell_c\left(\mathcal{F}\left(x_i^s\right), \mathcal{F}\left(x_i^w\right)\right) \odot \mathcal{M}_i,\]
where \(\mathcal{M}_i = \mathbb{1}\left(\max\left(\hat{\mathcal{F}}\left(x_i^w\right)\right) > \tau\right)\) is a binary mask that keeps only the pixels whose maximum predicted score exceeds \(\tau\), \(\tau\) denotes a fixed threshold, and \(\odot\) stands for the element-wise product.
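The thresholded loss can be sketched for a single image in plain Python. This is a minimal sketch under several assumptions not stated in these notes: \(\ell_c\) is taken to be pixel-wise cross-entropy, \(\hat{\mathcal{F}}\) is taken to be the softmax of the logits, the pseudo label is the hard argmax class, and the masked loss is averaged over all pixels of the image. The function names and the value of `tau` in the example are illustrative only.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one pixel's K class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def masked_pseudo_label_loss(strong_logits, weak_logits, tau):
    """Confidence-masked pseudo-label loss for one image.

    strong_logits, weak_logits: lists of per-pixel logit vectors (length K)
    for the strong and weak views of the same image.
    Returns (loss, kept), where kept is the number of pixels with mask 1.
    ASSUMPTIONS: cross-entropy loss, hard argmax pseudo labels,
    and averaging over all pixels (masked pixels contribute 0).
    """
    total, kept = 0.0, 0
    for s, w in zip(strong_logits, weak_logits):
        p_weak = softmax(w)
        conf = max(p_weak)
        if conf > tau:  # mask M = 1(max score > tau)
            label = p_weak.index(conf)  # hard pseudo label from the weak view
            total += -math.log(softmax(s)[label])  # cross-entropy on the strong view
            kept += 1
    return total / len(weak_logits), kept

# Two pixels, two classes: only the first weak prediction is confident.
weak = [[4.0, 0.0], [0.2, 0.0]]
strong = [[2.0, 0.0], [0.0, 3.0]]
loss, kept = masked_pseudo_label_loss(strong, weak, tau=0.9)
```

Here only the first pixel passes the threshold, so the second pixel contributes nothing to the loss. This shows the trade-off behind the fixed \(\tau\): a higher threshold gives cleaner pseudo labels but uses fewer pixels.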